Parametric v non-parametric methods for data analysis.

Authors

  • Douglas G Altman
  • J Martin Bland
Abstract

Continuous data arise in most areas of medicine. Familiar clinical examples include blood pressure, ejection fraction, forced expiratory volume in 1 second (FEV1), serum cholesterol, and anthropometric measurements. Methods for analysing continuous data fall into two classes, distinguished by whether or not they make assumptions about the distribution of the data.

Theoretical distributions are described by quantities called parameters, notably the mean and standard deviation.1 Methods that use distributional assumptions are called parametric methods, because we estimate the parameters of the distribution assumed for the data. Frequently used parametric methods include t tests and analysis of variance for comparing groups, and least squares regression and correlation for studying the relation between variables. All of the common parametric methods (“t methods”) assume that in some way the data follow a normal distribution and also that the spread of the data (variance) is uniform either between groups or across the range being studied. For example, the two sample t test assumes that the two samples of observations come from populations that have normal distributions with the same standard deviation. The importance of the assumptions for t methods diminishes as sample size increases.

Alternative methods, such as the sign test, Mann-Whitney test, and rank correlation, do not require the data to follow a particular distribution. They work by using the rank order of observations rather than the measurements themselves. Methods which do not require us to make distributional assumptions about the data, such as the rank methods, are called non-parametric methods. The term non-parametric applies to the statistical method used to analyse data, and is not a property of the data.1 As tests of significance, rank methods have almost as much power as t methods to detect a real difference when samples are large, even for data which meet the distributional requirements.

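
The contrast between the two classes of method can be illustrated with a minimal Python sketch. It is not taken from the article: the function names and the sample values are hypothetical, the t statistic uses the equal standard deviation assumption described above, and only the test statistics are computed (no P values). The point it tries to show is that the Mann-Whitney statistic depends only on the rank order of the observations, so an extreme value changes the t statistic but not U.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Two-sample t statistic, assuming both populations share one standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error


def midranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of observations tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Mann-Whitney U for sample a, computed from ranks of the pooled data."""
    ranks = midranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    return rank_sum_a - len(a) * (len(a) + 1) / 2


# Hypothetical samples: the second version of group_b contains one extreme value.
group_a = [1, 2, 3]
group_b = [4, 5, 6]
group_b_extreme = [4, 5, 600]

print(pooled_t_statistic(group_a, group_b), mann_whitney_u(group_a, group_b))
print(pooled_t_statistic(group_a, group_b_extreme), mann_whitney_u(group_a, group_b_extreme))
```

In this toy example U is the same for both versions of the second group, because the ordering of the pooled observations is unchanged, whereas the t statistic is computed from the measurements themselves and so is affected by the extreme value.
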
Non-parametric methods are most often used to analyse data which do not meet the distributional requirements of parametric methods. In particular, skewed data are frequently analysed by non-parametric methods, although data transformation can often make the data suitable for parametric analyses.2 Some data are scores rather than measurements. Some scores have many possible values, such as quality of life scales or data from visual analogue scales, while others have only a few possible values, such as Apgar scores or stage of disease. Scores with many values are often analysed using parametric methods, whereas those with few values tend to be analysed using rank methods, but there is no clear boundary between these cases.

To compensate for the advantage of being free of assumptions about the distribution of the data, rank methods have the disadvantage that they are mainly suited to hypothesis testing and no useful estimate is obtained, such as the average difference between two groups. Estimates and confidence intervals are easy to find with t methods. Non-parametric estimates and confidence intervals can be calculated, but they depend on extra assumptions which are almost as strong as those for t methods.3 Rank methods have the added disadvantage of not generalising to more complex situations, most obviously when we wish to use regression methods to adjust for several other factors.

Rank methods can generate strong views, with some people preferring them for all analyses and others believing that they have no place in statistics. We believe that rank methods are sometimes useful, but parametric methods are generally preferable as they provide estimates and confidence intervals and generalise to more complex analyses. The choice of approach may also be related to sample size, as the distributional assumptions are more important for small samples. We consider the analysis of small data sets in a subsequent Statistics Note.
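
As an illustration of why estimates and confidence intervals are easy to find with t methods, here is a minimal Python sketch (not from the article). The function name and the measurements are hypothetical. It assumes normal populations with equal standard deviations, and it takes the t critical value as an argument rather than computing it; 2.306 is the tabulated two-sided 5% point for 8 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def mean_difference_ci(a, b, t_critical):
    """Difference in means (a minus b) with a t-based confidence interval.

    Assumes both samples come from normal populations with the same
    standard deviation. t_critical must match len(a) + len(b) - 2
    degrees of freedom and the chosen confidence level.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    difference = mean(a) - mean(b)
    margin = t_critical * standard_error
    return difference, difference - margin, difference + margin


# Hypothetical measurements, five per group, so 8 degrees of freedom.
treated = [5.1, 4.8, 5.6, 5.0, 5.3]
control = [4.2, 4.6, 4.1, 4.9, 4.4]
T_CRITICAL_95_DF8 = 2.306  # two-sided 5% point of the t distribution, 8 df

estimate, lower, upper = mean_difference_ci(treated, control, T_CRITICAL_95_DF8)
print(f"difference = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

A rank test applied to the same data would give a P value but not, by itself, an estimate of the average difference like this one.
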


Similar resources

Evaluating the efficiency of Iranian industrial universities based on non-parametric and parametric approaches

The present study evaluates the efficiency of Iranian industrial universities using the non-parametric method of data envelopment analysis and the parametric method of stochastic frontier analysis, with input variables (number of incoming students, number of faculty members, number of staff and budget) and output variables (specific income, number of students studying, number of graduates and conference papers) and...


Regression Modeling for Spherical Data via Non-parametric and Least Square Methods

Introduction: Statistical analysis of data on the Earth's surface has been a favorite subject among many researchers. Such data can relate to animals' migration from one region to another. Statistical modeling of their paths then helps biological researchers to predict their movements and estimate the areas where the animals are most likely to be present. From a geome...


Comparison of Parametric and Non-parametric EEG Feature Extraction Methods in Detection of Pediatric Migraine without Aura

Background: Migraine headache without aura is the most common type of migraine, especially among pediatric patients. Diagnosing migraine from quantitative electroencephalography measurements through feature classification has always been a great challenge. It has been shown that different feature extraction and classification methods vary in terms of performance regarding detection and di...


Stochastic Non-Parametric Frontier Analysis

In this paper we develop an approach that synthesizes the best features of the two main methods in the estimation of production efficiency. Specifically, our approach first allows for statistical noise, similar to stochastic frontier analysis, and second, it allows modeling multiple-inputs-multiple-outputs technologies without imposing parametric assumptions on the production relationship, similar to...


A comparison of parametric and non-parametric methods of standardized precipitation index (SPI) in drought monitoring (Case study: Gorganroud basin)

The Standardized Precipitation Index (SPI) is the most common index for drought monitoring. Although the calculation of this index is usually done by using the gamma distribution fitting of precipitation data, studies have shown that for accurate monitoring of drought, the optimal distribution of precipitation in each month should be determined. On the other hand, in non-stationary time series,...



Journal:
  • BMJ

Volume 338  Issue 

Pages  -

Publication date 2009